Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

arXiv.org Machine Learning

Expert demonstrations, such as those from car drivers, help navigate environments with unknown rewards, but are often collected in controlled settings, such as closed-course test tracks, while learned control policies must be deployed in new environments, such as city streets. We can imitate experts to perform well in the same source environment where demonstrations are observed, and we may even use inverse reinforcement learning (IRL) to improve on simple behavior cloning (Ng and Russell, 2000; Abbeel and Ng, 2004; Ziebart et al., 2008; Fu et al., 2018; Geng et al., 2020). But the target environment may have a different transition law, discount factor, or soft-control regularization. For this, IRL is crucial: we can learn a reward from demonstrations in the source environment and transfer it to the target environment, learning a policy that optimizes the same reward function in a new setting (Fu et al., 2018; Schlaginhaufen and Kamgarpour, 2024). In this paper, we characterize how well this transfer can be done and which approaches are preferable. In particular, we show the value in a coupled approach that takes the target environment into account even when learning from the source. In ordinary offline control, the Bellman equation uses a known reward, so the main statistical error comes from target transitions.


7fd3b80fb1884e2927df46a7139bb8bf-Supplemental.pdf

Neural Information Processing Systems

The IDs of the 10 datasets used in this work, as well as the number of examples and features, are provided in Table 1 in the main manuscript. All of the datasets correspond to binary classification problems, with varying degrees of class imbalance. While the prediction is always performed in the logarithmic domain, when evaluating the models we transform both the labels and the model predictions back into their original domain. The loss function used for training and evaluation is the standard root mean-squared error (sklearn.metrics.mean_squared_error). We download the raw data programmatically using the Kaggle API, which produces the file train.tsv.
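The evaluation step described above (predict in the logarithmic domain, then score in the original domain) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the transform is a plain natural logarithm inverted with exp, and it computes the root mean-squared error by hand rather than calling sklearn.metrics.mean_squared_error. The function name rmse_original_domain and the sample values are hypothetical.

```python
import math

def rmse_original_domain(log_labels, log_preds):
    """RMSE computed after mapping log-domain values back with exp.

    Assumption: labels and predictions were transformed with the natural
    logarithm, so exp() restores the original domain.
    """
    labels = [math.exp(v) for v in log_labels]
    preds = [math.exp(v) for v in log_preds]
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, preds)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: true targets 10 and 20, predictions 12 and 18.
log_labels = [math.log(10.0), math.log(20.0)]
log_preds = [math.log(12.0), math.log(18.0)]
score = rmse_original_domain(log_labels, log_preds)
```

Scoring in the original domain keeps the reported error in the same units as the raw target, whereas an error measured in the log domain would reflect relative rather than absolute deviations.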


MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula

arXiv.org Artificial Intelligence

Mathematical problem solving is an important skill for Large Language Models (LLMs), both as an important capability and a proxy for a range of reasoning abilities. Existing benchmarks probe a diverse set of skills, but they yield aggregate accuracy metrics, obscuring specific abilities or weaknesses. Furthermore, they are difficult to extend with new problems, risking data contamination over time. To address these challenges, we propose MathCAMPS: a method to synthesize high-quality mathematical problems at scale, grounded on 44 fine-grained "standards" from the Mathematics Common Core (CC) Standard for K-8 grades. We encode each standard in a formal grammar, allowing us to sample diverse symbolic problems and their answers. We then use LLMs to realize the symbolic problems into word problems. We propose a cycle-consistency method for validating problem faithfulness. Finally, we derive follow-up questions from symbolic structures and convert them into follow-up word problems - a novel task of mathematical dialogue that probes for robustness in understanding. Experiments on 23 LLMs show surprising failures even in the strongest models (in particular when asked simple follow-up questions). Moreover, we evaluate training checkpoints of Pythia 12B on MathCAMPS, allowing us to analyze when particular mathematical skills develop during its training. Our framework enables the community to reproduce and extend our pipeline for a fraction of the typical cost of building new high-quality datasets.
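The idea of sampling symbolic problems together with their answers from a grammar can be illustrated with a toy sketch. The grammar below (expr -> integer | (expr op expr)) is hypothetical and far simpler than the encodings of Common Core standards the abstract refers to; the names OPS and sample_expr are assumptions made for this example only.

```python
import random

# Hypothetical toy grammar: expr -> integer | (expr op expr)
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def sample_expr(rng, depth):
    """Sample a symbolic expression; return its text and its exact value."""
    if depth == 0 or rng.random() < 0.3:
        n = rng.randint(1, 9)
        return str(n), n
    op = rng.choice(sorted(OPS))
    left_text, left_val = sample_expr(rng, depth - 1)
    right_text, right_val = sample_expr(rng, depth - 1)
    return f"({left_text} {op} {right_text})", OPS[op](left_val, right_val)

rng = random.Random(0)  # fixed seed so the sample is reproducible
problem, answer = sample_expr(rng, depth=2)
```

Because the answer is computed while the expression is generated, every sampled problem comes with a ground-truth solution, which is what makes a later consistency check on the corresponding word problem possible.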


Accelerating gradient-based topology optimization design with dual-model neural networks

arXiv.org Artificial Intelligence

Topology optimization (TO) is a common technique used in free-form designs. However, conventional TO-based design approaches suffer from high computational cost due to the need for repetitive forward calculations and/or sensitivity analysis, which are typically done using high-dimensional simulations such as Finite Element Analysis (FEA). In this work, neural networks are used as efficient surrogate models for forward and sensitivity calculations in order to greatly accelerate the design process of topology optimization. To improve the accuracy of sensitivity analyses, dual-model neural networks that are trained with both forward and sensitivity data are constructed and integrated into the Solid Isotropic Material with Penalization (SIMP) method to replace FEA. The performance of the accelerated SIMP method is demonstrated on two benchmark design problems, namely minimum compliance design and metamaterial design. For a problem of size 64x64, the speedup is 137 times in forward calculation and 74 times in sensitivity analysis. In addition, effective data generation methods suitable for TO designs are investigated and developed, which lead to a great saving in training time. In both benchmark design problems, a design accuracy of 95% can be achieved with only around 2000 training samples.


Learning Latent Dynamics for Planning from Pixels

arXiv.org Artificial Intelligence

Planning has been very successful for control tasks with known environment dynamics. To leverage planning in unknown environments, the agent needs to learn the dynamics from interactions with the world. However, learning dynamics models that are accurate enough for planning has been a long-standing challenge, especially in image-based domains. We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from pixels and chooses actions through online planning in latent space. To achieve high performance, the dynamics model must accurately predict the rewards ahead for multiple time steps. We approach this problem using a latent dynamics model with both deterministic and stochastic transition components and a generalized variational inference objective that we name latent overshooting. Using only pixel observations, our agent solves continuous control tasks with contact dynamics, partial observability, and sparse rewards. PlaNet uses significantly fewer episodes and reaches final performance close to and sometimes higher than top model-free algorithms.


Using deep learning for comprehensive, personalized forecasting of Alzheimer's Disease progression

arXiv.org Machine Learning

A patient is more than one number, yet most approaches to machine learning from electronic health data can only predict a single endpoint. Here, we present an alternative -- using unsupervised deep learning to simulate detailed patient trajectories. We use data comprising 18-month longitudinal trajectories of 42 clinical variables from 1908 patients with Mild Cognitive Impairment (MCI) or Alzheimer's Disease (AD) to train a model for personalized forecasting of disease progression. Our model simulates the evolution of each sub-component of cognitive exams, laboratory tests, and their associations with baseline clinical characteristics, generating both predictions and their confidence intervals. Even though it is not trained to predict changes in disease severity, our unsupervised model predicts changes in total ADAS-Cog scores with the same accuracy as specifically trained supervised models. We show how simulations can be used to interpret our model and demonstrate how to create synthetic control arm data for AD clinical trials. Our model's ability to simultaneously predict dozens of characteristics of a patient at any point in the future is a crucial step forward in computational precision medicine.


Conditional molecular design with deep generative models

arXiv.org Machine Learning

Although machine learning has been successfully used to propose novel molecules that satisfy desired properties, it is still challenging to explore a large chemical space efficiently. In this paper, we present a conditional molecular design method that facilitates generating new molecules with desired properties. The proposed model, which simultaneously performs both property prediction and molecule generation, is built as a semi-supervised variational autoencoder trained on a set of existing molecules with only a partial annotation. We generate new molecules with desired properties by sampling from the generative distribution estimated by the model. We demonstrate the effectiveness of the proposed model by evaluating it on drug-like molecules. The model improves the performance of property prediction by exploiting unlabeled molecules, and efficiently generates novel molecules fulfilling various target conditions.